Abstract



An important part of problems in statistical physics and computer
science can be expressed as the computation of marginal probabilities
over a Markov random field. The belief propagation algorithm, which
is an exact procedure to compute these marginals when the underlying
graph is a tree, has gained popularity as an efficient way to
approximate them in the more general case. In this paper, we focus
on an aspect of the algorithm that has received little attention
in the literature: the effect of the normalization of the messages.
We show in particular that, for a large class of normalization
strategies, it is possible to focus only on belief convergence. Following
this, we express the necessary and sufficient conditions for local
stability of a fixed point in terms of the graph structure and the belief
values at the fixed point. We also make explicit some connections between the
normalization constants and the underlying Bethe free energy.

1 Introduction 

We are interested in this article in a Markov random field on a finite graph
with local interactions, on which we want to compute marginal probabilities.
The structure of the underlying model is described by a set of discrete
variables $\mathbf{x} = \{x_i, i \in V\} \in \{1, \ldots, q\}^{V}$, where the variables in $V$ are linked
together by so-called "factors", which are subsets $a \subset V$ of variables. If $F$ is
this set of factors, we consider the set of probability measures of the form

$$p(\mathbf{x}) = \prod_{i \in V} \phi_i(x_i) \prod_{a \in F} \psi_a(\mathbf{x}_a), \tag{1.1}$$

where $\mathbf{x}_a = \{x_i, i \in a\}$.



$F$ together with $V$ define the factor graph $\mathcal{G}$ (Kschischang et al., 2001),
that is an undirected bipartite graph, which will be assumed to be connected.
We will also assume that the functions $\psi_a$ are never equal to zero, which
is to say that the Markov random field exhibits no deterministic behavior.
The set $E$ of edges contains all the couples $(a, i) \in F \times V$ such that $i \in a$.
We denote by $d_a$ (resp. $d_i$) the degree of the factor node $a$ (resp. of the variable
node $i$), and by $C$ the number of independent cycles of $\mathcal{G}$.

Exact procedures to compute marginal probabilities of $p$ generally face
an exponential complexity problem, and one has to resort to approximate
procedures. The Bethe approximation, which is used in statistical physics,
consists in minimizing an approximate version of the variational free
energy associated to (1.1). In computer science, the belief propagation (BP)
algorithm (Pearl, 1988) is a message passing procedure that allows one to
compute efficiently exact marginal probabilities when the underlying graph is
a tree. When the graph has cycles, it is still possible to apply the
procedure, which converges with a rather good accuracy on sufficiently sparse
graphs. However, there may be several fixed points, either stable or
unstable. It has been shown that these fixed points coincide with stationary
points of the Bethe free energy (Yedidia et al., 2005). In addition (Heskes,
2004; Watanabe and Fukumizu, 2009), stable fixed points of BP are local
minima of the Bethe free energy. We will come back to this variational point
of view of the BP algorithm in Section 6.

We discuss in this paper an aspect of the algorithm that has received
little attention in the literature: the effect of the normalization
of the messages on the behavior of the algorithm. Indeed, the justification
for normalization is generally that it "improves convergence". Moreover,
different authors use different schemes, without really explaining what
the differences between these definitions are.



The paper is organized as follows: the BP algorithm and its various
normalization strategies are defined in Section 2. Section 3 deals with the effect
of different types of message normalization on the existence of fixed points.
Section 4 is dedicated to the dynamic of the algorithm in terms of beliefs and
to cases where convergence of messages is equivalent to convergence of beliefs;
moreover, it is shown that normalization does not change the belief dynamic.
In Section 5, we show that normalization is required for convergence of the
messages, and provide some sufficient conditions. Finally, in Section 6, we
tackle the issue of normalization in the variational problem associated to
the Bethe approximation. New research directions are proposed in Section 7.

2 The belief propagation algorithm

The belief propagation algorithm (Pearl, 1988) is a message passing
procedure whose output is a set of estimated marginal probabilities, the
beliefs $b_a(\mathbf{x}_a)$ (including single-node beliefs $b_i(x_i)$). The idea is to factor the
marginal probability at a given site as a product of contributions coming
from neighboring factor nodes, which are the messages. With definition
(1.1) of the joint probability measure, the update rules read:

$$m_{a \to i}(x_i) \leftarrow \sum_{\mathbf{x}_{a \setminus i}} \psi_a(\mathbf{x}_a) \prod_{j \in a \setminus i} n_{j \to a}(x_j), \tag{2.1}$$

$$n_{i \to a}(x_i) := \phi_i(x_i) \prod_{a' \ni i,\, a' \neq a} m_{a' \to i}(x_i), \tag{2.2}$$

where the notation $\sum_{\mathbf{x}_s}$ should be understood as summing all the variables
$x_i$, $i \in s \subset V$, from 1 to $q$. At any point of the algorithm, one can compute
the current beliefs as



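$$b_i(x_i) := \frac{1}{Z_i(m)}\, \phi_i(x_i) \prod_{a \ni i} m_{a \to i}(x_i), \tag{2.3}$$

$$b_a(\mathbf{x}_a) := \frac{1}{Z_a(m)}\, \psi_a(\mathbf{x}_a) \prod_{i \in a} n_{i \to a}(x_i), \tag{2.4}$$

where $Z_i(m)$ and $Z_a(m)$ are the normalization constants that ensure that

$$\sum_{x_i} b_i(x_i) = 1, \qquad \sum_{\mathbf{x}_a} b_a(\mathbf{x}_a) = 1. \tag{2.5}$$

These constants reduce to 1 when $\mathcal{G}$ is a tree.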
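As an illustration of the update rules (2.1)-(2.2) and of the belief formula (2.3), the following minimal Python sketch runs plain BP on a small tree and compares the resulting beliefs with brute-force marginals of (1.1). The three-variable chain, the tables `phi` and `psi`, and all function names are hypothetical choices made for this example only; they do not come from the paper.

```python
import itertools
from math import prod

q = 2                                   # number of states per variable
# Hypothetical toy model: chain x0 - a - x1 - b - x2 (a tree).
phi = {0: [1.0, 2.0], 1: [1.0, 1.0], 2: [3.0, 1.0]}           # phi_i(x_i)
factors = {"a": (0, 1), "b": (1, 2)}                          # factor -> variables
psi = {"a": lambda x: 2.0 if x[0] == x[1] else 1.0,           # psi_a(x_a) > 0
       "b": lambda x: 1.0 if x[0] == x[1] else 3.0}

def neighbors(i):
    return [a for a, vs in factors.items() if i in vs]

def n_msg(m, i, a, xi):
    # (2.2): n_{i->a}(x_i) = phi_i(x_i) * prod over a' containing i, a' != a
    return phi[i][xi] * prod(m[(b, i)][xi] for b in neighbors(i) if b != a)

def plain_update(m):
    # (2.1): m_{a->i}(x_i) <- sum over x_{a\i} of psi_a(x_a) * prod_j n_{j->a}(x_j)
    new = {}
    for a, vs in factors.items():
        for i in vs:
            out = [0.0] * q
            for xa in itertools.product(range(q), repeat=len(vs)):
                w = psi[a](xa) * prod(n_msg(m, j, a, xj)
                                      for j, xj in zip(vs, xa) if j != i)
                out[xa[vs.index(i)]] += w
            new[(a, i)] = out
    return new

def beliefs(m):
    # (2.3): b_i(x_i) = phi_i(x_i) * prod_a m_{a->i}(x_i) / Z_i(m)
    b = {}
    for i in phi:
        u = [phi[i][x] * prod(m[(a, i)][x] for a in neighbors(i)) for x in range(q)]
        b[i] = [v / sum(u) for v in u]
    return b

m = {(a, i): [1.0] * q for a, vs in factors.items() for i in vs}
for _ in range(5):                      # a few parallel sweeps suffice on this tree
    m = plain_update(m)
bp = beliefs(m)

# Brute-force marginals of p(x) in (1.1), for comparison.
def weight(x):
    return prod(phi[i][x[i]] for i in phi) * prod(
        psi[a](tuple(x[i] for i in vs)) for a, vs in factors.items())

states = list(itertools.product(range(q), repeat=len(phi)))
Z = sum(weight(x) for x in states)
exact = {i: [sum(weight(x) for x in states if x[i] == k) / Z for k in range(q)]
         for i in phi}
```

On this tree the dictionaries `bp` and `exact` should agree up to rounding, which is the exactness property recalled in the introduction.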

In practice, the messages are often normalized so that

$$\sum_{x_i = 1}^{q} m_{a \to i}(x_i) = 1. \tag{2.6}$$

However, the possibilities of normalization are not limited to this setting.
Consider the mapping

$$\Theta_{ai,x_i}(m) := \sum_{\mathbf{x}_{a \setminus i}} \psi_a(\mathbf{x}_a) \prod_{j \in a \setminus i} \phi_j(x_j) \prod_{a' \ni j,\, a' \neq a} m_{a' \to j}(x_j). \tag{2.7}$$

A normalized version of BP is defined by the update rule

$$\tilde m_{a \to i}(x_i) \leftarrow \frac{\Theta_{ai,x_i}(\tilde m)}{Z_{ai}(\tilde m)}, \tag{2.8}$$

where $Z_{ai}(\tilde m)$ is a constant that depends on the messages and which, in the
case of (2.6), reads

$$Z^{\mathrm{mess}}_{ai}(\tilde m) := \sum_{x = 1}^{q} \Theta_{ai,x}(\tilde m). \tag{2.9}$$

In the remainder of this paper, (2.1)-(2.2) will be referred to as the "plain BP"
algorithm, to differentiate it from the "normalized BP" of (2.8).



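Following Wainwright (2002), it is worth noting that the plain message
update scheme can be rewritten as

$$m_{a \to i}(x_i) \leftarrow \frac{Z_a(m)\, b_{i|a}(x_i)}{Z_i(m)\, b_i(x_i)}\, m_{a \to i}(x_i), \tag{2.10}$$

where we use the convenient shorthand notation

$$b_{i|a}(x_i) := \sum_{\mathbf{x}_{a \setminus i}} b_a(\mathbf{x}_a).$$

This suggests a different type of normalization, used in particular by Heskes
(2004), namely

$$Z^{\mathrm{bel}}_{ai}(m) := \frac{Z_a(m)}{Z_i(m)}, \tag{2.11}$$

which leads to the simple update rule

$$\tilde m_{a \to i}(x_i) \leftarrow \frac{b_{i|a}(x_i)}{b_i(x_i)}\, \tilde m_{a \to i}(x_i). \tag{2.12}$$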

The following lemma recapitulates some properties shared by all
normalization strategies at a fixed point.

Lemma 2.1. Let $\tilde m$ be such that

$$\tilde m_{a \to i}(x_i) = \frac{\Theta_{ai,x_i}(\tilde m)}{Z_{ai}(\tilde m)}.$$

The associated normalization constants satisfy

$$Z_{ai}(\tilde m) = \frac{Z_a(\tilde m)}{Z_i(\tilde m)}, \qquad \forall ai \in E, \tag{2.13}$$

and the following compatibility condition holds:

$$\sum_{\mathbf{x}_{a \setminus i}} b_a(\mathbf{x}_a) = b_i(x_i). \tag{2.14}$$

In particular, when $Z_{ai} = 1$ (no normalization), all the $Z_a$ and $Z_i$ are equal
to some common constant $Z$.

Proof. The normalized update rule (2.8), together with (2.3)-(2.4), implies

$$\sum_{\mathbf{x}_{a \setminus i}} b_a(\mathbf{x}_a) = \frac{Z_{ai} Z_i}{Z_a}\, b_i(x_i).$$

By definition of $Z_a$ and $Z_i$, $b_a$ and $b_i$ are normalized to 1, so summing this
relation w.r.t. $x_i$ gives (2.13), and the equation above reduces to (2.14). ■

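It is known (Yedidia et al., 2005) that the belief propagation algorithm
is an iterative way of solving a variational problem: namely, it minimizes
over $b$ the Bethe free energy $F(b)$ associated with (1.1),

$$F(b) := \sum_{a, \mathbf{x}_a} b_a(\mathbf{x}_a) \log \frac{b_a(\mathbf{x}_a)}{\psi_a(\mathbf{x}_a)} + \sum_{i, x_i} b_i(x_i) \log \frac{b_i(x_i)^{1 - d_i}}{\phi_i(x_i)}. \tag{2.15}$$

Writing the Lagrangian of the minimization of (2.15) with $b$ subject to
the constraints (2.14) and (2.5), one obtains

$$\mathcal{L}(b, \lambda, \gamma) = F(b) + \sum_{i,\, a \ni i,\, x_i} \lambda_{ai}(x_i) \Bigl( b_i(x_i) - \sum_{\mathbf{x}_{a \setminus i}} b_a(\mathbf{x}_a) \Bigr) - \sum_{i} \gamma_i \Bigl( \sum_{x_i} b_i(x_i) - 1 \Bigr).$$

The minima are stationary points of $\mathcal{L}(b, \lambda, \gamma)$ which correspond to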

V'a(Xa) 



j&a bBj,b=^a 



1 -I— r 

bi{xi) = 4>i{xi) exp(— — - - 7j) Y[ rua^iix-i), G V 

091 



with the (invertible) parametrization

$$\lambda_{ai}(x_i) := \log \prod_{b \ni i,\, b \neq a} m_{b \to i}(x_i).$$

Enforcing constraints (2.14) yields the BP fixed point equations with
normalization terms $\gamma_i$. We will return to this variational setting in Section 6.



3 Normalization and existence of fixed points 

We discuss here an aspect of the algorithm that has received little
attention in the literature, namely the equivalence of the fixed points of the
normalized and plain BP flavors.

It is not immediate to check that the normalized version of the algorithm
does not introduce new fixed points, which would therefore not correspond to
true stationary points of the Bethe free energy. We show in Theorem 3.2
that the sets of fixed points are equivalent, except possibly when the graph
$\mathcal{G}$ has one unique cycle.



As pointed out by Mooij and Kappen (2007), many different sets of
messages can correspond to the same set of beliefs. The following lemma shows
that the set of messages leading to the same beliefs is simply constructed
through linear mappings.

Lemma 3.1. Two sets of messages $m$ and $m'$ lead to the same beliefs if, and
only if, there is a set of strictly positive constants $c_{ai}$ such that

$$m_{a \to i}(x_i) = c_{ai}\, m'_{a \to i}(x_i).$$

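Proof. The direct part of the lemma is trivial. Concerning the other part,
we have from (2.3) and (2.4)

$$\frac{b_a(\mathbf{x}_a)\, Z_a(m)}{\psi_a(\mathbf{x}_a) \prod_{j \in a} \phi_j(x_j)} = \prod_{j \in a} \prod_{b \ni j,\, b \neq a} m_{b \to j}(x_j), \qquad \frac{b_i(x_i)\, Z_i(m)}{\phi_i(x_i)} = \prod_{a \ni i} m_{a \to i}(x_i).$$

Assume the two vectors of messages $m$ and $m'$ lead to the same set of
beliefs $b$ and write $m_{a \to i}(x_i) = c_{ai,x_i}\, m'_{a \to i}(x_i)$. Then, from the relation on
$b_i$, the vector $c$ satisfies

$$\prod_{a \ni i} c_{ai,x_i} = \prod_{a \ni i} \frac{m_{a \to i}(x_i)}{m'_{a \to i}(x_i)} = \frac{Z_i(m)}{Z_i(m')} =: v_i. \tag{3.1}$$

Moreover, we want to preserve the beliefs $b_a$. Using (3.1), we have

$$\prod_{j \in a} c_{aj,x_j} = \prod_{j \in a} \frac{m_{a \to j}(x_j)}{m'_{a \to j}(x_j)} = \frac{Z_a(m')}{Z_a(m)} \prod_{j \in a} v_j =: v_a. \tag{3.2}$$

Since $v_i$ (resp. $v_a$) does not depend on the choice of $x_i$ (resp. $\mathbf{x}_a$), (3.2)
implies the independence of $c_{ai,x_i}$ with respect to $x_i$. Indeed, if we compare
two vectors $\mathbf{x}_a$ and $\mathbf{x}'_a$ such that, for all $i \in a \setminus j$, $x'_i = x_i$, but $x'_j \neq x_j$, then
$c_{aj,x_j} = c_{aj,x'_j}$, which concludes the proof.

3.1 From normalized BP to plain BP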

We show that in most cases the fixed points of a normalized BP algorithm
(no matter the normalization used) are associated with fixed points of the
plain BP algorithm. Recall that $C$ is the number of independent cycles of
$\mathcal{G}$.

Theorem 3.2. A fixed point $\tilde m$ of the BP algorithm with normalized
messages corresponds to a fixed point of the plain BP algorithm associated to
the same beliefs iff one of the two following conditions is satisfied:

(i) the graph $\mathcal{G}$ has either no cycle or more than one ($C \neq 1$);

(ii) $C = 1$, and the normalization constants of the associated beliefs are
such that

$$\prod_{a \in F} Z_a(\tilde m) \prod_{i \in V} Z_i(\tilde m)^{1 - d_i} = 1. \tag{3.3}$$

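Proof. Let $\tilde m$ be a fixed point of (2.8). Let us find a set of constants $c_{ai}$ such
that $m_{a \to i}(x_i) = c_{ai}\, \tilde m_{a \to i}(x_i)$ is a non-zero fixed point of (2.1)-(2.2). Using
Lemma 3.1, we see that $m$ and $\tilde m$ correspond to the same beliefs. Since
$\Theta$ is multi-linear in the messages, we have

$$m_{a \to i}(x_i) = c_{ai}\, \frac{\Theta_{ai,x_i}(\tilde m)}{Z_{ai}(\tilde m)} = \frac{c_{ai}}{Z_{ai}(\tilde m)} \Bigl[ \prod_{j \in a \setminus i} \prod_{a' \ni j,\, a' \neq a} \frac{1}{c_{a'j}} \Bigr] \Theta_{ai,x_i}(m),$$

and therefore $m$ is a fixed point of plain BP when

$$\log c_{ai} - \sum_{j \in a \setminus i} \sum_{a' \ni j,\, a' \neq a} \log c_{a'j} = \log Z_{ai}.$$

This equation is precisely in the setting of Lemma A.2 given in the Appendix,
with $x_{ai} = \log c_{ai}$ and $y_{ai} = \log Z_{ai} = \log Z_a - \log Z_i$. It always has a
solution when $C \neq 1$; when $C = 1$, the additional condition given in that
lemma is required, and (3.3) follows. ■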

There is in general an infinite number of fixed points $m$ corresponding
to each $\tilde m$. However, as noted at the beginning of the section, this is not a
problem, since all these fixed points correspond to the same set of beliefs.
In this sense, normalizing the messages can have the effect of collapsing
equivalent fixed points.

When $C = 1$, it is known (Weiss, 2000) that normalized BP always
converges to a fixed point. However, the theorem above states that there
may be no basic fixed point $m$ corresponding to a given $\tilde m$.

It is actually not difficult to see what happens in this case: assume a
trivial network with two variables and two factors $a = b = \{1, 2\}$ and assume
for simplicity that $\phi_1 = \phi_2 = 1$. The equations for the BP fixed point boil
down to relations like

$$m_{a \to 1}(x_1) = \sum_{x_2} \psi_a(x_1, x_2)\, m_{b \to 2}(x_2),$$

or, with a matrix notation,

$$m_{a \to 1} = \Psi_a\, m_{b \to 2} = \Psi_a \Psi_b\, m_{a \to 1},$$

where $\Psi_a$ (resp. $\Psi_b$) denotes the matrix of the linear map $m_{b \to 2} \mapsto m_{a \to 1}$
(resp. $m_{a \to 1} \mapsto m_{b \to 2}$).

Therefore, the matrix $\Psi_a \Psi_b$ necessarily has 1 as an eigenvalue. Since
this is not true in general, there can be no fixed point for basic BP. In
the normalized case, Weiss (2000) shows that BP always converges to the
Perron vector of this matrix. We know there is an infinite number (not
even countable, see Lemma 3.1) of sets of messages corresponding to the same
beliefs.

It is possible that the behavior of the algorithm leads to convergence of
the beliefs without the convergence of messages, as the case $C = 1$ suggests.
Indeed, the plain BP scheme is then a linear dynamical system which can
converge to a subspace as described in Hartfiel (1997). We will describe
more precisely this kind of behavior in Section 4.
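The two-factor example can be explored numerically. The sketch below iterates the round-trip map $m_{a \to 1} \mapsto \Psi_a \Psi_b\, m_{a \to 1}$ with and without normalization. The tables `psi_a` and `psi_b` are hypothetical positive values chosen for illustration, and the message is normalized only once per round trip, which is a simplification of (2.8) that does not change its direction because the map is linear.

```python
def matvec(M, v):
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]

# Hypothetical positive factor tables for the cycle a = b = {1, 2}, with phi = 1.
psi_a = [[2.0, 1.0], [1.0, 3.0]]        # psi_a[x1][x2]
psi_b = [[1.0, 2.0], [2.0, 1.0]]        # psi_b[x1][x2]

Psi_a = psi_a                            # maps m_{b->2} to m_{a->1}
Psi_b = [[psi_b[x1][x2] for x1 in range(2)] for x2 in range(2)]  # maps m_{a->1} to m_{b->2}

def round_trip(m_a1):
    # Plain BP around the cycle: m_{a->1} <- Psi_a Psi_b m_{a->1}
    return matvec(Psi_a, matvec(Psi_b, m_a1))

plain = [1.0, 1.0]
normed = [0.5, 0.5]
growth = 0.0
for _ in range(60):
    plain = round_trip(plain)            # no normalization: the scale drifts
    t = round_trip(normed)
    growth = sum(t)                      # normalization constant of this round trip
    normed = [v / growth for v in t]     # normalized message, sums to 1

# Perron eigenvalue of Psi_a Psi_b = [[4, 5], [7, 5]] from its characteristic polynomial.
perron = (9.0 + 141.0 ** 0.5) / 2.0
```

With these tables the Perron eigenvalue is not 1, so `plain` keeps growing geometrically while `normed` settles on the Perron vector and `growth` approaches `perron`.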

3.2 From plain BP to normalized BP

It turns out that there is no general result about whether a plain BP fixed 
point is mapped to a fixed point by normalization. In this section, we will 
thus first examine the case of a fairly general family of normalizations, and 
then look at two other examples. 

Definition 3.3. A normalization $Z_{ai}$ is said to be positive homogeneous
when it is of the form $Z_{ai} = N_{ai} \circ \Theta_{ai}$, with $N_{ai} : \mathbb{R}^q \to \mathbb{R}$ positive
homogeneous functions of order 1 satisfying

$$N_{ai}(\lambda\, m_{a \to i}) = \lambda\, N_{ai}(m_{a \to i}), \qquad \forall \lambda > 0, \tag{3.4}$$

$$N_{ai}(m_{a \to i}) = 0 \iff m_{a \to i} = 0. \tag{3.5}$$

The part $\Leftarrow$ of (3.5) is obviously implied by (3.4). A particular
family of positive homogeneous normalizations is built from all norms $N_{ai}$
on $\mathbb{R}^q$. These contain in particular the normalization $Z^{\mathrm{mess}}_{ai}(m)$ (2.9) or the
maximum of messages, $\max_{x} \Theta_{ai,x}(m)$.

It is actually not necessary to have a proper norm: Watanabe and Fukumizu
use a scheme that amounts to



The following proposition describes the effect of the above family of 
normalizations. 

Proposition 3.4. All the fixed points of the plain BP algorithm leading
to the same set of beliefs correspond to a unique fixed point of a positive
homogeneous normalized scheme.

Proof. Let $m$ be a fixed point of the plain BP scheme. Using Lemma 3.1,
a fixed point $\tilde m$ of the normalized scheme associated with the same beliefs
as $m$ is such that

$$\tilde m_{a \to i}(x_i) = c_{ai}\, m_{a \to i}(x_i). \tag{3.6}$$

Since $\Theta$ is multi-linear,

$$\Theta_{ai,x_i}(\tilde m) = \Bigl( \prod_{j \in a \setminus i} \prod_{d \ni j,\, d \neq a} c_{dj} \Bigr) \Theta_{ai,x_i}(m),$$

and, using (3.4),

$$\tilde m_{a \to i}(x_i) = \frac{\Theta_{ai,x_i}(\tilde m)}{Z_{ai}(\tilde m)} = \frac{\Theta_{ai,x_i}(m)}{Z_{ai}(m)} = \frac{m_{a \to i}(x_i)}{Z_{ai}(m)}.$$

Therefore, $\tilde m$ is determined uniquely from $m$. Since $\tilde m$ is clearly invariant for
all the sets of messages $m$ corresponding to the same beliefs (see Lemma 3.1),
the proof is complete. ■

In order to emphasize the result of Proposition 3.4, it is interesting to
describe what happens with the belief normalization $Z^{\mathrm{bel}}_{ai}$ (2.11). We know
from Lemma 2.1 that, for any normalization, we have at any fixed point

$$Z_{ai}(m) = \frac{Z_a(m)}{Z_i(m)} =: Z^{\mathrm{bel}}_{ai}(m).$$

Therefore, any fixed point of any normalized scheme (even of the plain
scheme) is a fixed point of the scheme with normalization $Z^{\mathrm{bel}}_{ai}$. We see
the difference between this kind of normalization and a positive
homogeneous one. While the latter collapses families of fixed points to one unique
fixed point, $Z^{\mathrm{bel}}_{ai}$ instead conserves all the fixed points of all possible schemes.

To conclude this section, we present an example of a "bad
normalization" to illustrate a worst-case scenario. Consider the following
normalization:

$$Z_{ai}(m) := \frac{\sum_{x} \Theta_{ai,x}(m)}{\sup_{x} m_{a \to i}(x)}.$$

This normalization, which is not homogeneous at all, defines a BP algorithm
which does not admit any fixed point. Following the proof of Proposition 3.4,
let $\tilde m$ be a fixed point of normalized BP associated with a plain fixed point
$m$ through (3.6); then

$$\tilde m_{a \to i}(x_i) = \frac{\Theta_{ai,x_i}(\tilde m)}{Z_{ai}(\tilde m)} = \frac{\tilde m_{a \to i}(x_i)}{Z_{ai}(m)}.$$

Indeed it is easy to check that

$$Z_{ai}(\tilde m) = \frac{\Theta_{ai,x_i}(\tilde m)}{\tilde m_{a \to i}(x_i)}\, Z_{ai}(m).$$

Since for any fixed point $m$ of the plain update we have $Z_{ai}(m) > 1$, no
message $\tilde m$ can be a fixed point for this normalized scheme. Using Theorem 3.2,
we conclude that this scheme admits no fixed point.



4 Belief dynamic 

We are interested here in looking at the dynamic in terms of convergence
of beliefs. At each step of the algorithm, using (2.3) and (2.4), we can
compute the current beliefs $b_i^{(n)}$ and $b_a^{(n)}$ associated with the message $m^{(n)}$.
The sequence $m^{(n)}$ will be said to be "b-convergent" when the sequences
$b_i^{(n)}$ and $b_a^{(n)}$ converge. The term "simple convergence" will be used to refer
to convergence of the sequence $m^{(n)}$ itself. Simple convergence obviously
implies b-convergence. We will first show that for a positive homogeneous
normalization, b-convergence and simple convergence are equivalent. We
will then conclude by looking at b-convergence in a quotient space
introduced in Mooij and Kappen (2007), and we show the links between these
two approaches.

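Proposition 4.1. For any positive homogeneous normalization $Z_{ai}$ with
continuous $N_{ai}$, simple convergence and b-convergence are equivalent.

Proof. Assume that the sequences of beliefs, indexed by iteration $n$, are such
that $b_a^{(n)} \to b_a$ and $b_i^{(n)} \to b_i$ as $n \to \infty$. The idea of the proof is first to
express the normalized messages $\tilde m^{(n)}_{a \to i}$ at each step in terms of these beliefs,
and then to conclude by a continuity argument. Starting from a rewrite of
(2.3)-(2.4),

$$b_i^{(n)}(x_i) = \frac{\phi_i(x_i)}{Z_i(\tilde m^{(n)})} \prod_{a \ni i} \tilde m^{(n)}_{a \to i}(x_i), \qquad b_a^{(n)}(\mathbf{x}_a) = \frac{\psi_a(\mathbf{x}_a)}{Z_a(\tilde m^{(n)})} \prod_{j \in a} \phi_j(x_j) \prod_{b \ni j,\, b \neq a} \tilde m^{(n)}_{b \to j}(x_j),$$

one obtains by recombination

$$\tilde m^{(n)}_{a \to i}(x_i) \prod_{j \in a \setminus i} \tilde m^{(n)}_{a \to j}(x_j) = \frac{\prod_{j \in a} Z_j(\tilde m^{(n)})}{Z_a(\tilde m^{(n)})}\, K^{(n)}_{ai}(\mathbf{x}_{a \setminus i}; x_i),$$

where an arbitrary variable $i \in a$ has been singled out and

$$K^{(n)}_{ai}(\mathbf{x}_{a \setminus i}; x_i) := \frac{\psi_a(\mathbf{x}_a) \prod_{j \in a} b_j^{(n)}(x_j)}{b_a^{(n)}(\mathbf{x}_a)}.$$

Assume now that $\mathbf{x}_{a \setminus i}$ is fixed and consider $K^{(n)}_{ai}(\mathbf{x}_{a \setminus i}; \cdot)$
as a vector of $\mathbb{R}^q$. Normalizing each side of the equation with the positive
homogeneous function $N_{ai}$ yields

$$\frac{\tilde m^{(n)}_{a \to i}(x_i)}{N_{ai}\bigl[\tilde m^{(n)}_{a \to i}\bigr]} = \frac{K^{(n)}_{ai}(\mathbf{x}_{a \setminus i}; x_i)}{N_{ai}\bigl[K^{(n)}_{ai}(\mathbf{x}_{a \setminus i}; \cdot)\bigr]}.$$

Actually $N_{ai}\bigl[\tilde m^{(n)}_{a \to i}\bigr] = 1$, since $\tilde m^{(n)}_{a \to i}$ has been normalized by $N_{ai}$, and
therefore

$$\tilde m^{(n)}_{a \to i}(x_i) = \frac{K^{(n)}_{ai}(\mathbf{x}_{a \setminus i}; x_i)}{N_{ai}\bigl[K^{(n)}_{ai}(\mathbf{x}_{a \setminus i}; \cdot)\bigr]}.$$

This concludes the proof, since $\tilde m^{(n)}_{a \to i}$ has been expressed as a continuous
function of $b_i^{(n)}$ and $b_a^{(n)}$, and therefore it converges whenever the beliefs
converge. ■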



We now follow an idea developed in Mooij and Kappen (2007) and study
the behavior of the BP algorithm in a quotient space corresponding to the
invariance of beliefs. First we will introduce a natural parametrization for
which the quotient space is just a vector space. Then it will be trivial to
show that, in terms of b-convergence, the effect of normalization is null.

The idea of b-convergence is easier to express with the new
parametrization

$$\mu_{ai}(x_i) := \log m_{a \to i}(x_i),$$

so that the plain update mapping (2.7) becomes

$$\Lambda_{ai,x_i}(\mu) := \log \Bigl[ \sum_{\mathbf{x}_{a \setminus i}} \psi_a(\mathbf{x}_a) \exp \Bigl( \sum_{j \in a \setminus i} \Bigl( \log \phi_j(x_j) + \sum_{b \ni j,\, b \neq a} \mu_{bj}(x_j) \Bigr) \Bigr) \Bigr].$$

We have $\mu \in \mathcal{N} := \mathbb{R}^{|E| q}$ and we define the vector space $W$ which is the
linear span of the following vectors $\{e_{ai} \in \mathcal{N}\}_{(ai) \in E}$:

$$(e_{ai})_{cj,x_j} := \mathbb{1}_{\{ai = cj\}}.$$

It is trivial to see that the invariance set of the beliefs corresponding to $\mu$
described in Lemma 3.1 is simply the affine space $\mu + W$. So the
b-convergence of a sequence $\mu^{(n)}$ is simply the convergence of $\mu^{(n)}$ in the
quotient space $\mathcal{N} \setminus W$ (which is a vector space, see Halmos (1974)). Finally
we define the notation $[x]$ for the canonical projection of $x$ on $\mathcal{N} \setminus W$.

Suppose that we resort to some kind of normalization on $\mu$; it is easy
to see that this normalization plays no role in the quotient space. The
normalization on $\mu$ leads to $\mu + w$ with some $w \in W$. We have

$$\Lambda_{ai,x_i}(\mu + w) = \Lambda_{ai,x_i}(\mu) + \sum_{j \in a \setminus i} \sum_{b \ni j,\, b \neq a} w_{bj} =: \Lambda_{ai,x_i}(\mu) + l_{ai},$$

which can be summed up by

$$[\Lambda(\mu + w)] = [\Lambda(\mu)], \tag{4.1}$$

since $l \in W$. We conclude by a proposition which is directly implied by (4.1).

Proposition 4.2. The dynamic, i.e. the value of the normalized beliefs at
each step, of the BP algorithm with or without normalization is exactly the
same.
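Proposition 4.2 can be checked numerically on a small loopy example. The sketch below runs the plain mapping (2.7) and its normalized version (2.8) with $Z^{\mathrm{mess}}$ side by side from the same initial messages and records the largest difference between the single-variable beliefs (2.3). The triangle model and its tables are hypothetical, and only a moderate number of sweeps is used because the plain messages grow geometrically.

```python
import itertools
from math import prod

q = 2
# Hypothetical toy model (not from the paper): three pairwise factors on a
# triangle, so the factor graph has exactly one cycle.
factors = {"a": (0, 1), "b": (1, 2), "c": (0, 2)}
phi = {0: [1.0, 2.0], 1: [2.0, 1.0], 2: [1.0, 1.5]}
psi = {"a": [[2.0, 1.0], [1.0, 3.0]],
       "b": [[1.0, 2.0], [2.0, 1.0]],
       "c": [[1.5, 1.0], [1.0, 2.5]]}

def update(m, normalize):
    """One parallel sweep of (2.7); divides by Z^mess (2.9) when normalize is True."""
    new = {}
    for a, vs in factors.items():
        for pos, i in enumerate(vs):
            out = [0.0] * q
            for xa in itertools.product(range(q), repeat=2):
                j, xj = vs[1 - pos], xa[1 - pos]      # the other variable of factor a
                others = prod(m[(c, j)][xj] for c in factors
                              if j in factors[c] and c != a)
                out[xa[pos]] += psi[a][xa[0]][xa[1]] * phi[j][xj] * others
            z = sum(out) if normalize else 1.0
            new[(a, i)] = [v / z for v in out]
    return new

def beliefs(m):
    # (2.3): b_i(x_i) proportional to phi_i(x_i) * prod_a m_{a->i}(x_i)
    result = {}
    for i in phi:
        u = [phi[i][x] * prod(m[(a, i)][x] for a in factors if i in factors[a])
             for x in range(q)]
        result[i] = [v / sum(u) for v in u]
    return result

init = {(a, i): [1.0] * q for a, vs in factors.items() for i in vs}
m_plain, m_norm = dict(init), dict(init)
max_gap = 0.0
for _ in range(15):
    m_plain, m_norm = update(m_plain, False), update(m_norm, True)
    bp, bn = beliefs(m_plain), beliefs(m_norm)
    max_gap = max(max_gap,
                  max(abs(bp[i][x] - bn[i][x]) for i in phi for x in range(q)))
```

The recorded `max_gap` should stay at rounding level at every sweep, even though the two message sequences themselves differ by ever larger scale factors.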

We will come back to this point of view in terms of quotient space in Section 5.3.



5 Local stability of BP fixed points 



The question of convergence of BP has been addressed in a series of works
(Tatikonda and Jordan, 2002; Mooij and Kappen, 2007; Ihler et al., 2005)
which establish conditions and bounds on the MRF coefficients for having
global convergence. In this section, we change the viewpoint and, instead of
looking for conditions ensuring a single fixed point, we examine the different
fixed points for a given joint probability and their local properties.

In what follows, we are interested in the local stability of a message fixed
point $m$ with associated beliefs $b$. It is known that a BP fixed point is
locally attractive if the Jacobian of the relevant mapping ($\Theta$ or its normalized
version) at this point has a spectral radius strictly smaller than 1, and
unstable when the spectral radius is strictly greater than 1. The term "spectral
radius" should be understood here as the modulus of the largest eigenvalue
of the Jacobian matrix.

We will first show that BP with plain messages can in fact never
converge when there is more than one cycle (Theorem 5.1), and then explain
how normalization of messages improves the situation (Proposition 5.2,
Theorem 5.3).



5.1 Unnormalized messages 

The characterization of the local stability relies on two ingredients. The
first one is the oriented line graph $L(\mathcal{G})$ based on $\mathcal{G}$, whose vertices are the
elements of $E$, and whose oriented links relate $ai$ to $a'j$ if $j \in a \cap a'$, $j \neq i$
and $a' \neq a$. The corresponding 0-1 adjacency matrix $A$ is defined by the
coefficients

$$A^{a'j}_{ai} := \mathbb{1}_{\{j \in a \cap a',\ j \neq i,\ a' \neq a\}}.$$

The second ingredient is the set of stochastic matrices $B^{(iaj)}$ attached
to pairs of variables $(i, j)$ having a factor node $a$ in common, and whose
coefficients are the conditional beliefs,

$$b^{(iaj)}_{k\ell} := b_a(x_j = \ell \mid x_i = k) = \frac{b_{ij|a}(k, \ell)}{b_i(k)}, \qquad b_{ij|a}(x_i, x_j) := \sum_{\mathbf{x}_{a \setminus \{i,j\}}} b_a(\mathbf{x}_a),$$

for all $(k, \ell) \in \{1, \ldots, q\}^2$.

Using the representation (2.10) of the BP algorithm, the Jacobian reads
at this point:

$$\frac{\partial \Theta_{ai,x_i}(m)}{\partial m_{a' \to j}(x_j)} = \frac{b_{ij|a}(x_i, x_j)}{b_i(x_i)}\, \frac{m_{a \to i}(x_i)}{m_{a' \to j}(x_j)}\, A^{a'j}_{ai}.$$

Therefore, the Jacobian of the plain BP algorithm is (using a trivial
change of variable) similar to the matrix $J$ defined, for any pair $(ai, k)$ and
$(a'j, \ell)$ of $E \times \{1, \ldots, q\}$, by the elements

$$J^{a'j,\ell}_{ai,k} := b^{(iaj)}_{k\ell}\, A^{a'j}_{ai}.$$

This expression is analogous to the Jacobian encountered in Mooij and Kappen
(2007). It is interesting to note that it only depends on the structure of the
graph and on the belief corresponding to the fixed point.

Since $\mathcal{G}$ is a singly connected graph, it is clear that $A$ is an irreducible
matrix. To simplify the discussion, we assume in the following that $J$ is also
irreducible. This will be true as long as the $\psi_a$ are always positive. It is easy
to see that to any right eigenvector of $A$ corresponds a right eigenvector
of $J$ associated to the same eigenvalue: if $v = (v_{ai}, ai \in E)$ is such that
$Av = \lambda v$, then the vector $v^+$, defined by coordinates $v^+_{a'j,\ell} := v_{a'j}$, for all
$a'j \in E$ and $\ell \in \{1, \ldots, q\}$, satisfies $J v^+ = \lambda v^+$. We will say that $v^+$ is an
$A$-based right eigenvector of $J$. Similarly, if $u$ is a left eigenvector of $A$, with
obvious notations one can define an $A$-based left eigenvector $u^+$ of $J$ by the
following coordinates: $u^+_{ai,k} := u_{ai}\, b_i(k)$.

Using this correspondence between the two matrices, we can prove the
following result.

Theorem 5.1. If the graph $\mathcal{G}$ has more than one cycle ($C > 1$), and the
matrix $J$ is irreducible, then the plain BP update rules (2.1)-(2.2) do not
admit any stable fixed point.

Proof. Let $\pi$ be the right Perron vector of $A$, which has positive entries,
since $A$ is irreducible (Seneta, 2006, Theorem 1.5). The $A$-based vector
$\pi^+$ also has positive coordinates and is therefore the right Perron vector of
$J$ (Seneta, 2006, Theorem 1.6); the spectral radius of $J$ is thus equal to the
one of $A$.

When $C > 1$, Lemma A.1 implies that 1 is an eigenvalue of $A$ associated
to divergenceless vectors. However, such vectors cannot be non-negative,
and therefore the Perron eigenvalue of $A$ is strictly greater than 1. This
concludes the proof of the theorem. ■
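The role of the matrix $A$ in Theorem 5.1 can be illustrated by building it explicitly. The sketch below constructs the 0-1 adjacency matrix of the oriented line graph from a factor list and estimates its largest eigenvalue by power iteration on $A + I$ (the shift is an implementation choice of this sketch, made to avoid oscillations on periodic matrices). Both example graphs and all function names are hypothetical.

```python
def line_graph_matrix(factors):
    """0-1 matrix A with A[(a,i)][(a',j)] = 1 iff j in a and a', j != i, a' != a."""
    edges = [(a, i) for a, vs in factors.items() for i in vs]
    # (b, j) is an edge, so j already belongs to factor b; check the rest.
    A = [[1.0 if (j in factors[a] and j != i and b != a) else 0.0
          for (b, j) in edges] for (a, i) in edges]
    return edges, A

def largest_eigenvalue(A, iters=500):
    """Power iteration on A + I started from the all-ones vector."""
    n = len(A)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [v[r] + sum(A[r][c] * v[c] for c in range(n)) for r in range(n)]
        lam = max(w)                     # v has max entry 1, so this estimates rho(A + I)
        v = [x / lam for x in w]
    return lam - 1.0                     # undo the shift by the identity

# Hypothetical example 1: a triangle of pairwise factors (one cycle).
one_cycle = {"a": (0, 1), "b": (1, 2), "c": (0, 2)}
# Hypothetical example 2: the same triangle plus a factor on all three
# variables, which gives C = |E| - |V| - |F| + 1 = 9 - 3 - 4 + 1 = 3 cycles.
several_cycles = {"a": (0, 1), "b": (1, 2), "c": (0, 2), "d": (0, 1, 2)}

lam_one = largest_eigenvalue(line_graph_matrix(one_cycle)[1])
lam_several = largest_eigenvalue(line_graph_matrix(several_cycles)[1])
```

For the single cycle every row of $A$ sums to 1, so the estimate is 1. For the second graph the row sums are 2 or 4, so the estimate lies above 1, in line with the instability stated in Theorem 5.1.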



5.2 Positively homogeneous normalization 

We have seen in Proposition 4.1 that all the continuous positively
homogeneous normalizations make simple convergence equivalent to b-convergence.
As a result, one expects that local stability of fixed points will again depend
on the belief structure only. Since all the positively homogeneous
normalizations share the same properties, we look at the particular case of $Z^{\mathrm{mess}}_{ai}(m)$,
which is both simple and differentiable. We then obtain a Jacobian matrix
with more interesting properties. In particular, this matrix depends not
only on the beliefs at the fixed point, but also on the messages themselves:
for the normalized BP algorithm (2.8) with $Z^{\mathrm{mess}}_{ai}$, the coefficients of the
Jacobian at fixed point $\tilde m$ with beliefs $b$ read

$$\frac{\partial}{\partial \tilde m_{a' \to j}(x_j)} \Bigl[ \frac{\Theta_{ai,x_i}(\tilde m)}{Z^{\mathrm{mess}}_{ai}(\tilde m)} \Bigr] = \Bigl[ \frac{b_{ij|a}(x_i, x_j)}{b_i(x_i)} - \sum_{x=1}^{q} \tilde m_{a \to i}(x)\, \frac{b_{ij|a}(x, x_j)}{b_i(x)} \Bigr] \frac{\tilde m_{a \to i}(x_i)}{\tilde m_{a' \to j}(x_j)}\, A^{a'j}_{ai},$$

which is again similar to the matrix $\tilde J$ of general term

$$\tilde J^{a'j,\ell}_{ai,k} := \Bigl[ b^{(iaj)}_{k\ell} - \sum_{x=1}^{q} \tilde m_{a \to i}(x)\, b^{(iaj)}_{x\ell} \Bigr] A^{a'j}_{ai}. \tag{5.2}$$

It is actually possible to prove that the spectrum of $\tilde J$ does not depend
on the messages themselves but only on the beliefs at the fixed point.

Proposition 5.2. The eigenvectors of $J$ are associated to eigenvectors of $\tilde J$
with the same eigenvalues, except the $A$-based eigenvectors of $J$ (including
its Perron vector), which belong to the kernel of $\tilde J$.

Proof. The new Jacobian matrix can be expressed from the old one as $\tilde J =
(I - M) J$, where $M$ is the matrix whose coefficient at row $(ai, k)$ and column
$(a'j, \ell)$ is $\mathbb{1}_{\{a = a',\ i = j\}}\, \tilde m_{a' \to j}(\ell)$. Elementary computations yield the following
properties of $M$:

• $M^2 = M$: $M$ is a projector;

• $\tilde J M = 0$.

For any right eigenvector $v$ of $J$ associated to some eigenvalue $\lambda$,

$$\tilde J (v - Mv) = \tilde J v = (I - M) J v = \lambda (v - Mv),$$

so that $v - Mv$ is a (right) eigenvector of $\tilde J$ associated to $\lambda$, unless $v$ is an
$A$-based eigenvector, in which case $v = Mv$ and $v$ is in the kernel of $\tilde J$.

Similarly, if $u$ is such that $u^T \tilde J = \lambda u^T$ for $\lambda \neq 0$, then $\lambda u^T M = u^T \tilde J M = 0$
and therefore $u^T J = u^T (I - M) J = u^T \tilde J = \lambda u^T$: any non-zero eigenvalue
of $\tilde J$ is an eigenvalue of $J$. This proves the last part of the theorem. ■

As a consequence of this proposition, when $J$ is an irreducible matrix,
$\tilde J$ has a strictly smaller spectral radius: the net effect of normalization is
to improve convergence (although it may actually not be enough to
guarantee convergence). To quantify this improvement of convergence related
to message normalization, we resort to classical arguments used for the speed of
convergence of Markov chains (see e.g. Brémaud (1999)).



The presence of the messages in the Jacobian matrix $\tilde J$ complicates the
evaluation of this effect. However, it is known (see e.g. Furtlehner et al.)
that it is possible to choose the functions $\phi$ and $\psi$ as

$$\phi_i(x_i) = \hat b_i(x_i), \qquad \psi_a(\mathbf{x}_a) = \frac{\hat b_a(\mathbf{x}_a)}{\prod_{i \in a} \hat b_i(x_i)} \tag{5.3}$$

in order to obtain a prescribed set of beliefs $\hat b$ at a fixed point. Indeed, BP
will admit a fixed point with $b_a = \hat b_a$ and $b_i = \hat b_i$ when $m_{a \to i}(x_i) \equiv 1$. Since
only the beliefs matter here, without loss of generality, we restrict ourselves
in the remainder of this section to the functions (5.3). Then, from (5.2), the
definition of $\tilde J$ rewrites

$$\tilde J^{a'j,\ell}_{ai,k} := \Bigl[ b^{(iaj)}_{k\ell} - \frac{1}{q} \sum_{x=1}^{q} b^{(iaj)}_{x\ell} \Bigr] A^{a'j}_{ai}.$$

For each connected pair $(i, j)$ of variable nodes, we associate to the
stochastic kernel $B^{(iaj)}$ a combined stochastic kernel $K^{(iaj)} := B^{(iaj)} B^{(jai)}$,
with coefficients

$$K^{(iaj)}_{k\ell} := \sum_{m=1}^{q} b^{(iaj)}_{km}\, b^{(jai)}_{m\ell}.$$

Since $b^{(i)} B^{(iaj)} = b^{(j)}$, $b^{(i)}$ is the invariant measure associated to $K^{(iaj)}$:

$$b^{(i)} K^{(iaj)} = b^{(i)} B^{(iaj)} B^{(jai)} = b^{(i)},$$

and $K^{(iaj)}$ is reversible, since

$$b^{(i)}_k K^{(iaj)}_{k\ell} = \sum_{m=1}^{q} \frac{b_{ij|a}(k, m)\, b_{ij|a}(\ell, m)}{b_j(m)} = b^{(i)}_\ell K^{(iaj)}_{\ell k}.$$

Let $\mu_2^{(iaj)}$ be the second largest eigenvalue of $K^{(iaj)}$ and let

$$\mu_2 := \max_{i, a, j} \bigl| \mu_2^{(iaj)} \bigr|^{1/2}.$$

The combined effect of the graph and of the local correlations on the
stability of the reference fixed point is stated as follows.

Theorem 5.3. Let $\lambda_1$ be the Perron eigenvalue of the matrix $A$.

(i) If $\lambda_1 \mu_2 < 1$, the fixed point of the normalized BP scheme (2.8) with
$Z^{\mathrm{mess}}_{ai}$ associated to $b$ is stable.

(ii) Condition (i) is necessary and sufficient if the system is homogeneous
($B^{(iaj)} = B$ independent of $i$, $j$ and $a$), with $\mu_2$ representing the second
largest eigenvalue of $B$.

Proof. See Appendix B. ■
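To make the quantity $\mu_2$ concrete, the following sketch computes it for a single pair of binary variables from a joint belief table, by forming the conditional kernels and their product $K$. The joint table, the value used for $\lambda_1$, and the function name are hypothetical, and the closed form for the second eigenvalue is only valid for $q = 2$.

```python
def mu2_for_pair(joint):
    """sqrt(|second eigenvalue of K|) for one pair (i, j), with K = B^(iaj) B^(jai).

    joint[k][l] is the pairwise belief b_{ij|a}(x_i = k, x_j = l); binary case only.
    """
    q = len(joint)
    assert q == 2, "this sketch only handles binary variables"
    b_i = [sum(joint[k][l] for l in range(q)) for k in range(q)]
    b_j = [sum(joint[k][l] for k in range(q)) for l in range(q)]
    # B_ij[k][l] = b(x_j = l | x_i = k), B_ji[l][k] = b(x_i = k | x_j = l)
    B_ij = [[joint[k][l] / b_i[k] for l in range(q)] for k in range(q)]
    B_ji = [[joint[k][l] / b_j[l] for k in range(q)] for l in range(q)]
    K = [[sum(B_ij[k][m] * B_ji[m][l] for m in range(q)) for l in range(q)]
         for k in range(q)]
    # A 2x2 stochastic matrix has eigenvalues 1 and trace - 1.
    return abs(K[0][0] + K[1][1] - 1.0) ** 0.5

joint = [[0.4, 0.1], [0.1, 0.4]]         # hypothetical pairwise belief
mu2 = mu2_for_pair(joint)
lambda1 = 1.5                             # assumed Perron eigenvalue of A
stable = lambda1 * mu2 < 1.0              # sufficient condition (i) of Theorem 5.3
```

Here both conditional kernels equal `[[0.8, 0.2], [0.2, 0.8]]`, so $K$ has second eigenvalue 0.36 and `mu2` is 0.6. With the assumed `lambda1`, the product is 0.9 and condition (i) holds.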

The quantity $\mu_2$ is representative of the level of mutual information
between variables. It relates to the spectral gap (see e.g. Diaconis and Stroock
(1991) for geometric bounds) of each elementary stochastic matrix $B^{(iaj)}$,
while $\lambda_1$ encodes the statistical properties of the graph connectivity. The
bound $\lambda_1 \mu_2 < 1$ could be refined when dealing with the statistical average
of the sum over paths in (B.1), which allows one to define $\mu_2$ as

^'=^'^oo^^^,){y^\ e ( n i^t'Y}- 



at, a' J 



5.3 Local convergence in quotient space $\mathcal{N} \setminus W$

The idea is to make the connection between local stability of fixed points
as described previously and the same notion of local stability but in the
quotient space $\mathcal{N} \setminus W$ described in Section 4. A trivial computation based on
the results of Section 5.1 gives us the derivatives of $\Lambda$:

$$\frac{\partial \Lambda_{ai,x_i}(\mu)}{\partial \mu_{bj}(x_j)} = \frac{b_{ij|a}(x_i, x_j)}{b_i(x_i)}\, A^{bj}_{ai} = J^{bj,x_j}_{ai,x_i}.$$

In terms of convergence in $\mathcal{N} \setminus W$, the stability of a fixed point is given by the
projection of $J$ on the quotient space $\mathcal{N} \setminus W$ and we have (Mooij and Kappen,
2007):

$$[J] := [\nabla \Lambda] = \nabla [\Lambda].$$

Proposition 5.4. The eigenvalues of $[J]$ are the eigenvalues of $J$ which
are not associated with $A$-based eigenvectors. The $A$-based eigenvectors of $J$
belong to the kernel of $[J]$.

Proof. Let $v$ be an eigenvector of $J$ for the eigenvalue $\lambda$; we have

$$[Jv] = [\lambda v] = \lambda [v],$$

so $[v]$ is an eigenvector of $[J]$ with the same eigenvalue $\lambda$ iff $[v] \neq 0$. The
$A$-based eigenvectors (see Section 5.1) $w$ of $J$ belong to $W$, so we have

$$[w] = 0.$$

It means that these eigenvectors of $J$ have no equivalent w.r.t. $[J]$ and play
no role in belief fixed point stability. ■

We have seen that the normalization $Z^{\mathrm{mess}}_{ai}$ is equivalent to multiplying
the Jacobian matrix $J$ by the projection $I - M$ (Proposition 5.2), with

$$\ker(I - M) = W.$$

The projection $I - M$ is in fact a quotient map from $\mathcal{N}$ to $\mathcal{N} \setminus W$. So the
normalization $Z^{\mathrm{mess}}_{ai}$ is strictly equivalent, when we look at the messages
$m_{a \to i}(x_i)$, to working on the quotient space $\mathcal{N} \setminus W$. More generally, for
any differentiable positively homogeneous normalization we will obtain the
same result: the Jacobian of the corresponding normalized scheme will be
the projection of the Jacobian $J$ on the quotient space $\mathcal{N} \setminus W$, through some
quotient map.

6 Normalization in the variational problem 

Since Proposition 4.2 shows that the choice of normalization has no real
effect on the dynamic of BP, it will have no effect on b-convergence either.
In this section, we turn to the effect of normalization on the underlying
variational problem. It will be assumed here that the beliefs $b_i$ and $b_a$ are
normalized (2.5) and compatible (2.14). If only (2.14) is satisfied, they will
be denoted $\beta_i$ and $\beta_a$. It is quite obvious that imposing only compatibility
constraints leads to a unique normalization constant $Z$,

$$Z(\beta) := \sum_{x_i} \beta_i(x_i) = \sum_{\mathbf{x}_a} \beta_a(\mathbf{x}_a),$$

which is not a priori related to the constants $Z_a(m)$ and $Z_i(m)$ seen in
the previous sections. The quantities $\beta_i(x_i)/Z(\beta)$ and $\beta_a(\mathbf{x}_a)/Z(\beta)$ can be
denoted as $b_i(x_i)$ and $b_a(\mathbf{x}_a)$ since (2.5) holds for them.

The aim of this section is to explicit the relationship between the min- 
imizations of the Bethe free energy ()2.15p with and without normalization 
constraints ()2.5p . Generally speaking, we can express them as a minimiza- 
tion problem V{E) on some set E as 

V{E) : argminF(/3) (6.1) 

I3£E 

where E is chosen as follows 

• plain case: $E = E_1$ is the set of positive measures such that (2.14) holds,

• normalized case: $E = E_2 \subset E_1$ has the additional constraint (2.5).

It is possible to derive a BP algorithm for the plain problem following the
same path as in Section 2. The resulting update equations will be identical,
except for the 7^ terms. 

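The first step is to compare the solutions of (6.1) on $E_1$ and $E_2$. Let $\varphi$
be the bijection between $E_1$ and $E_2 \times \mathbb{R}_+^*$,

$$\varphi : E_2 \times \mathbb{R}_+^* \longrightarrow E_1, \qquad (b, Z) \longmapsto bZ.$$

The variational problem $\mathcal{P}(E_1)$ is equivalent to

$$(\hat b, \hat Z) = \operatorname*{argmin}_{(b,Z) \in E_2 \times \mathbb{R}_+^*} F(\varphi(b, Z)),$$

with $\varphi(\hat b, \hat Z) = \hat b \hat Z = \hat\beta = \operatorname{argmin} F(\beta)$.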

The next step is to express the Bethe free energy $F(\beta)$ of an unnormalized
positive measure $\beta$ as a function of the Bethe free energy of the
corresponding normalized measure b.

Lemma 6.1. As soon as the factor graph is connected, for any $\beta = Zb \in E_1$
we have

$$F(Zb) = Z\big(F(b) + (1 - C)\log Z\big), \tag{6.2}$$

with C being the number of independent cycles of the graph.

Proof.

$$\begin{aligned}
F(\beta) = F(Zb) &= Z\Big[\sum_{a,\mathbf{x}_a} b_a(\mathbf{x}_a)\log\frac{Z\,b_a(\mathbf{x}_a)}{\psi_a(\mathbf{x}_a)} + \sum_{i,x_i} b_i(x_i)\log\frac{\big(Z\,b_i(x_i)\big)^{1-d_i}}{\phi_i(x_i)}\Big] \\
&= Z\big(F(b) + (|F| + |V| - |E|)\log Z\big) \\
&= Z\big(F(b) + (1 - C)\log Z\big),
\end{aligned}$$

where the last equality comes from elementary graph theory (see e.g. Berge
(1967)). ■
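The scaling identity of Lemma 6.1 can be checked numerically on a small example. The sketch below assumes the usual form of the Bethe free energy, $F(b) = \sum_a \sum_{\mathbf{x}_a} b_a \log(b_a/\psi_a) + \sum_i \sum_{x_i} b_i \log(b_i^{1-d_i}/\phi_i)$, which is taken here to be the content of (2.15). The toy graph (two binary variables shared by three pairwise factors, so that C = 2), the potentials and all helper names are illustrative choices, not taken from the paper.

```python
import math

# Toy factor graph (hypothetical example): variables 1, 2 and three factors
# a, b, c, each containing both variables. |E| = 6, |V| = 2, |F| = 3, so C = 2.
factors = {"a": (1, 2), "b": (1, 2), "c": (1, 2)}
variables = (1, 2)
degree = {i: sum(i in vs for vs in factors.values()) for i in variables}
n_edges = sum(len(vs) for vs in factors.values())
C = n_edges - len(variables) - len(factors) + 1

# Normalized, compatible beliefs: every factor belief is the same joint table,
# and the variable beliefs are its marginals.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
b_a = {a: dict(joint) for a in factors}
b_i = {1: {0: 0.5, 1: 0.5}, 2: {0: 0.6, 1: 0.4}}

# Arbitrary positive potentials (illustrative values).
psi = {
    "a": {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 1.5},
    "b": {(0, 0): 0.7, (0, 1): 1.1, (1, 0): 2.3, (1, 1): 0.9},
    "c": {(0, 0): 1.2, (0, 1): 0.4, (1, 0): 1.0, (1, 1): 3.0},
}
phi = {1: {0: 0.8, 1: 1.3}, 2: {0: 1.6, 1: 0.6}}


def bethe_free_energy(beta_a, beta_i):
    """Assumed Bethe free energy of a positive, possibly unnormalized, measure."""
    total = 0.0
    for a, table in beta_a.items():
        for xa, v in table.items():
            total += v * math.log(v / psi[a][xa])
    for i, table in beta_i.items():
        for xi, v in table.items():
            total += v * ((1 - degree[i]) * math.log(v) - math.log(phi[i][xi]))
    return total


def scale(beliefs, z):
    """Multiply every entry of every belief table by z."""
    return {k: {x: z * v for x, v in t.items()} for k, t in beliefs.items()}


Z = 2.5
lhs = bethe_free_energy(scale(b_a, Z), scale(b_i, Z))           # F(Zb)
rhs = Z * (bethe_free_energy(b_a, b_i) + (1 - C) * math.log(Z))  # right side of (6.2)
print(C, lhs, rhs)
```

The two printed values agree up to floating-point error because each factor contributes one $Z \log Z$ term and each variable contributes $(1 - d_i)\, Z \log Z$, which sums to $(|F| + |V| - |E|)\, Z \log Z$.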

The quantity 1 — C will be negative in the nontrivial cases (at least 2 
cycles). Since all the Zb are equivalent from our point of view, we look at 
the derivatives of F{Zb) as a function of Z to see what happens in the plain 
variational problem. 

Theorem 6.2. The normalized beliefs corresponding to the extrema of the
plain variational problem $\mathcal{P}(E_1)$ are exactly the same as the ones of the
normalized problem $\mathcal{P}(E_2)$ as soon as $C \neq 1$.

Proof. Using Lemma 6.1 we obtain

$$\frac{\partial F(\beta)}{\partial Z} = F(b) + (1 - C)(\log Z + 1),$$

so the stationary points are

$$Z = \exp\Big(\frac{F(b)}{C - 1} - 1\Big). \tag{6.3}$$

At these points we can compute the Bethe free energy

$$F(\beta) = F(Zb) = (C - 1)\exp\Big(\frac{F(b)}{C - 1} - 1\Big) \stackrel{\text{def}}{=} G(F(b)).$$

It is easy to check that, if $C \neq 1$, G is an increasing function, so the extrema
of $F(\beta)$ are reached at the same normalized beliefs. More precisely, if $b_1$ and
$b_2$ are elements of $E_2$ such that $F(b_1) < F(b_2)$ then $F(\beta_1 = Z_1 b_1) < F(\beta_2 = Z_2 b_2)$,
which allows us to conclude. ■

In other words, imposing a normalization in the variational problem or
normalizing after a solution is reached is equivalent as long as $C \neq 1$.
Moreover, in the unnormalized case, the Bethe free energy at the local extremum
reads

$$F(b) = (C - 1)(\log Z + 1). \tag{6.4}$$

We can therefore compare the "quality" of different fixed points by compar- 
ing only the normalization constant obtained: the smaller Z is, the better 
the approximation, modulo the fact that we're not minimizing a true dis- 
tance. 
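The stationarity argument can be illustrated with a few lines of arithmetic. The sketch below treats F(b) and C as plain numbers (the values are arbitrary and the function names are hypothetical) and checks that the stationary Z makes the derivative of (6.2) vanish, that the free energy there equals $(C - 1) Z$, that (6.4) is recovered, and that G is increasing for a few values of $C \neq 1$.

```python
import math


def scaled_free_energy(z, f_b, c):
    """F(Zb) as a function of Z, following (6.2)."""
    return z * (f_b + (1 - c) * math.log(z))


def stationary_z(f_b, c):
    """Stationary point of Z -> F(Zb); only defined for c != 1."""
    return math.exp(f_b / (c - 1) - 1)


def g(f_b, c):
    """Value of F(Zb) at the stationary point, as a function of F(b)."""
    return (c - 1) * math.exp(f_b / (c - 1) - 1)


f_b, c = -0.7, 3  # illustrative values only
z_star = stationary_z(f_b, c)

# Central finite difference of Z -> F(Zb) at the stationary point.
h = 1e-6
derivative = (scaled_free_energy(z_star + h, f_b, c)
              - scaled_free_energy(z_star - h, f_b, c)) / (2 * h)

value_at_z_star = scaled_free_energy(z_star, f_b, c)
recovered_f_b = (c - 1) * (math.log(z_star) + 1)  # relation (6.4)

# G should be increasing whenever c != 1 (checked on a few sample points).
g_is_increasing = all(
    g(-1.0, cc) < g(0.0, cc) < g(1.0, cc) for cc in (0, 2, 5)
)
print(z_star, derivative, value_at_z_star, recovered_f_b, g_is_increasing)
```

Since $G'(y) = \exp(y/(C - 1) - 1) > 0$, the monotonicity holds for trees (C = 0) as well as for graphs with several cycles; only the sign of G changes.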

When C = 1, it has been shown already in Section 3 that the normalized
scheme is always convergent, whereas the plain scheme can have no fixed
point. In this case, (6.2) rewrites

$$F(\beta) = F(Zb) = Z F(b).$$

The form of this relationship shows what happens: if the extremum of the
normalized variational problem is strictly negative, $F(\beta)$ is unbounded from
below and Z will diverge to $+\infty$; conversely, if the extremum is strictly
positive, Z will go to zero. In the (very) particular case where the minimum
of the normalized problem is equal to zero, the problem is still well defined.
In fact this condition $F(b) = 0$ is equivalent to the one of Theorem 3.2 when
C = 1.



To sum up, as soon as the plain variational problem is well defined, it is
equivalent to the normalized one, and the normalization constant makes it easy
to compute the Bethe free energy using (6.4). When this is no longer the
case, we still know that the dynamics of both algorithms remain the same
(Proposition 4.2), but the plain variational problem (which can still converge
in terms of beliefs) will not converge in terms of the normalization constant Z,
and we no longer have easy access to the free energy of the fixed point.



As emphasized previously, the relationship between Z, $Z_a(m)$ and $Z_i(m)$
is not trivial. In the case of the plain BP algorithm, for which $Z_a(m) = Z_i(m)$,
an elementary computation yields the following relation at any fixed
point

$$F(b) = (C - 1)\log Z_a(m),$$

which seemingly contradicts (6.4). In fact, the algorithm derived from the
plain variational problem is not exactly the plain BP scheme. Usually, since
one resorts to some kind of normalization, the multiplicative constants of
the fixed point equations are discarded (see Yedidia et al. (2005) for more
details). Keeping track of them yields



i{xi) = exp 



di 



(6.5) 



/3a(Xa) = -V'a(Xa) JJ%-_^a(xj), 



I3i{xi) = (piixi) 



exp Y\'^b^ii^i)- 



Actually, the plain update scheme (2.1, 2.2) corresponds to some constant
normalization exp {^-E^ ■ Without any normalization, using (|6.5p as update 
rule, one would obtain 

Z = ^ = Z,(m)exp (^^J. 






7 Conclusion 



The motivation of this paper was to fill a void in the literature about the effect of
normalization on the BP algorithm. What we have learnt can be summarized
in a few main points:

• using a normalization in BP can in some rare cases kill or create new 
fixed points; 

• not all normalizations are created equal when it comes to message
convergence, but there is a large class of positively homogeneous
normalizations that all have the same effect;

• the user is ultimately concerned with the convergence of beliefs, and
thankfully the dynamics of normalized beliefs is insensitive to normalization.

The messages having no interest by themselves, it is worth remarking
that, by combining the update rules (2.1, 2.2) with the definitions (2.3) and
(2.4) of the beliefs, one can eliminate the messages and obtain update rules
acting on the beliefs only.



One particularity of these update rules is that they do not depend on the
functions $\psi$ or $\phi$ but only on the graph structure. The dependency on the
joint law (1.1) occurs only through the initial conditions. This "product
sum" algorithm therefore shares common properties for all models built on
the same underlying graph, and the initial conditions should impose the
details of the joint law. To our knowledge this algorithm has never been
studied, and we leave it for future work.

A Spectral properties of the factor graph 

This appendix is devoted to some properties of the matrix A defined in (5.1)
that are used in Sections E] and O 

We consider two types of fields associated to $\mathcal{G}$, namely scalar fields and
vector fields. Scalar fields are quantities attached to the vertices of the graph,
while vector fields are attached to its edges. A vector field $w = \{w_{ai},\ ai \in E\}$
is divergenceless if

$$\forall a \in F,\ \sum_{i \in a} w_{ai} = 0 \quad\text{and}\quad \forall i \in V,\ \sum_{a \ni i} w_{ai} = 0.$$

A vector field $u = \{u_{ai},\ ai \in E\}$ is a gradient if there exists a scalar field
$\{u_a, u_i,\ a \in F,\ i \in V\}$ such that

$$\forall ai \in E,\quad u_{ai} = u_a - u_i.$$

There is an orthogonal decomposition of any vector field into a
divergenceless and a gradient component. Indeed, the scalar product

$$w^T u = \sum_{ai \in E} w_{ai} u_{ai} = \sum_{a \in F} u_a \sum_{i \in a} w_{ai} - \sum_{i \in V} u_i \sum_{a \ni i} w_{ai}$$

is 0 for all gradient fields u iff w is divergenceless. Dimensional
considerations show that any vector field v can be decomposed in this way.
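The dimension count behind this decomposition can be verified on a small example. In the sketch below, the factor graph, the `rank` helper and all variable names are illustrative assumptions. Gradient fields are the column space of the signed edge-node incidence matrix, and the divergence conditions say exactly that w is orthogonal to every column, so the divergenceless space has dimension $|E| - \mathrm{rank}$, which should equal the number of independent cycles C for a connected graph.

```python
from fractions import Fraction

# Hypothetical connected factor graph: factor name -> variables it contains.
factors = {"a": [1, 2], "b": [2, 3], "c": [1, 3], "d": [1, 2, 3]}
variables = sorted({i for vs in factors.values() for i in vs})
edges = [(a, i) for a, vs in factors.items() for i in vs]


def rank(rows):
    """Rank of a matrix by exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(len(m)):
            if r != rk and m[r][col] != 0:
                f = m[r][col] / m[rk][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[rk])]
        rk += 1
    return rk


# Row (a, i) has +1 in the column of factor a and -1 in the column of
# variable i, so incidence @ u gives the gradient field u_a - u_i.
nodes = list(factors) + variables
column = {n: k for k, n in enumerate(nodes)}
incidence = []
for a, i in edges:
    row = [0] * len(nodes)
    row[column[a]] = 1
    row[column[i]] = -1
    incidence.append(row)

gradient_dim = rank(incidence)
# w is divergenceless iff it is orthogonal to every column of the incidence
# matrix, so the divergenceless space is the orthogonal complement.
divergenceless_dim = len(edges) - gradient_dim
C = len(edges) - len(variables) - len(factors) + 1
print(gradient_dim, divergenceless_dim, C)
```

For this graph there are 9 edges and 7 nodes, so the gradient space has dimension 6 (scalar fields modulo constants) and the divergenceless space has dimension 3, the number of independent cycles.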

In the following, it will be useful to define the Laplace operator $\Delta$
associated to $\mathcal{G}$. For any scalar field u:

$$(\Delta u)_a = d_a u_a - \sum_{i \in a} u_i, \tag{A.1}$$

$$(\Delta u)_i = d_i u_i - \sum_{a \ni i} u_a. \tag{A.2}$$

The following lemma describes the spectrum of A in terms of a Laplace
equation on the graph $\mathcal{G}$.

Lemma A.1. (i) Both gradient and divergenceless vector spaces are
A-invariant, and divergenceless vectors are eigenvectors of A with eigenvalue
1. (ii) Eigenvectors associated to eigenvalues $\lambda \neq 1$ are gradient vectors of
a scalar field u which satisfies

$$(\Delta u)_a = \frac{1 - \lambda}{\lambda}(1 - d_a)\,u_a \quad\text{and}\quad (\Delta u)_i = (1 - \lambda)\,u_i, \tag{A.3}$$

and there exists a gradient vector associated to 1 iff $\mathcal{G}$ has exactly one cycle
(C = 1).

Proof. The action of A on a given vector x reads

$$\sum_{a'j \in E} A_{ai}^{a'j}\, x_{a'j} = \sum_{j \in a}\Big(\sum_{a' \ni j} x_{a'j} - x_{aj}\Big) - \sum_{a' \ni i} x_{a'i} + x_{ai}.$$

The first two terms in the second member vanish if x is divergenceless. In
addition, the first term in parentheses is independent of i while the second
one is independent of a, so the first assertion is justified. We concentrate
then on solving the eigenvalue equation $Ax - \lambda x = 0$ for a gradient vector
x, with $x_{ai} = u_a - u_i$. $Ax - \lambda x$ is the gradient of a constant scalar $K \in \mathbb{R}$,
and by identification we have

$$(\Delta u)_i = (1 - \lambda)u_i + K.$$

The Laplacian of a constant scalar is zero, so for $\lambda \neq 1$, K may be reabsorbed
in u, and combining these two equations with the help of identities (A.1, A.2)
yields equation (A.3). For $\lambda = 1$, we obtain

$$(\Delta u)_a = (1 - d_a)K \quad\text{and}\quad (\Delta u)_i = K. \tag{A.4}$$

Let D be the diagonal matrix associated to the graph $\mathcal{G}$, whose diagonal
entries are the degrees $d_a$ and $d_i$ of each node. $M = I - D^{-1}\Delta$ is a stochastic
irreducible matrix, whose unique right Perron vector (1, ..., 1) generates the
kernel of $\Delta$. As a result, for K = 0, the solution to (A.4) is $u_a = u_i = \text{constant}$,
so that $x_{ai} = 0$.

For $K \neq 0$, there is a solution if the second member of (A.4) is orthogonal
($\Delta$ is a symmetric operator) to the kernel. The condition reads

$$0 = \sum_{a \in F}(1 - d_a) + \sum_{i \in V} 1 = |F| - |E| + |V| = 1 - C,$$

where the last equality comes from elementary graph theory (see e.g. Berge
(1967)). ■
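Part (i) of Lemma A.1 can be illustrated on a toy graph. The sketch below assumes that the matrix A of (5.1) acts on edge fields as $(Ax)_{ai} = \sum_{j \in a,\, j \neq i} \sum_{a' \ni j,\, a' \neq a} x_{a'j}$, which is the reading suggested by the proof; this definition, the example graph (three factors sharing two variables, C = 2) and the helper names are assumptions made for illustration. Exact integer arithmetic is used so that equalities can be tested directly.

```python
# Hypothetical factor graph: three factors a, b, c, each containing variables 1 and 2.
factors = {"a": [1, 2], "b": [1, 2], "c": [1, 2]}
edges = [(a, i) for a, vs in factors.items() for i in vs]


def apply_A(x):
    """Assumed action of A: sum x over edges (a', j) with j in a, j != i, a' != a."""
    out = {}
    for a, i in edges:
        total = 0
        for j in factors[a]:
            if j == i:
                continue
            for a2, j2 in edges:
                if j2 == j and a2 != a:
                    total += x[(a2, j2)]
        out[(a, i)] = total
    return out


def dot(x, y):
    return sum(x[e] * y[e] for e in edges)


# Two independent divergenceless fields: each sums to zero around every
# factor and around every variable. They span the divergenceless space (C = 2).
w1 = {("a", 1): 1, ("a", 2): -1, ("b", 1): -1, ("b", 2): 1, ("c", 1): 0, ("c", 2): 0}
w2 = {("a", 1): 0, ("a", 2): 0, ("b", 1): 1, ("b", 2): -1, ("c", 1): -1, ("c", 2): 1}

# A gradient field x_ai = u_a - u_i built from an arbitrary scalar field u.
u = {"a": 1, "b": 2, "c": 4, 1: 0, 2: 3}
x = {(a, i): u[a] - u[i] for a, i in edges}
Ax = apply_A(x)

print(apply_A(w1) == w1, apply_A(w2) == w2, dot(Ax, w1), dot(Ax, w2))
```

The first two checks show that divergenceless fields are fixed by A (eigenvalue 1), and the vanishing scalar products show that the image of a gradient field is orthogonal to the divergenceless space, hence again a gradient.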

Since 1 is an eigenvalue of A, it is interesting to investigate linear equa- 
tions involving I — A. Since it is already known that divergenceless vectors 
are in the kernel of this matrix, we restrict ourselves to the case where the 
constant term is of gradient type. 

Lemma A.2. For a given gradient vector field y, the equation

$$(I - A)x = y$$

has a solution (unique up to a divergenceless vector) iff $C \neq 1$, or C = 1 and

$$\sum_{a \in F} y_a + \sum_{i \in V}(1 - d_i)\,y_i = 0. \tag{A.5}$$



Proof. We look here only for gradient-type solutions $x_{ai} = u_a - u_i$ and write
$y_{ai} = y_a - y_i$. Owing to the same arguments as in Lemma A.1, there exists
a constant K such that

$$(\Delta u)_a = K(d_a - 1) + y_a - \sum_{j \in a} y_j,$$

$$(\Delta u)_i = y_i - K.$$

Stating as before the compatibility condition for this equation, that is,
summing both right-hand sides over all nodes and using
$\sum_{a} d_a = |E|$ and $\sum_{a}\sum_{j \in a} y_j = \sum_{i} d_i y_i$, yields

$$\sum_{a \in F} y_a + \sum_{i \in V}(1 - d_i)\,y_i = K(1 - C).$$

It is always possible to find a suitable K as long as $C \neq 1$, and when C = 1,
(A.5) has to hold. ■



B Proof of Theorem 5.3



Let us start with (ii): when the system is homogeneous, J is a tensor product 
of A with and its spectrum is therefore the product of their respective 
spectra. In particular if Q has uniform degrees da and dj, the condition reads 

^Ji2{da - l){di - I) < 1. 

In order to prove part (i) of the theorem, we will consider a local norm 
on M'^ attached to each variable node i, 

IkllftW = {jZ^lf^kJ and = 

fc=l k=l 

the local average of X G ]R9 w.r.t For convenience we will also consider 
the somewhat hybrid global norm on R^'^l'^l 

II II — ' II II 

a—^i 

where $\pi$ is again the right Perron vector of A, associated to $\lambda_1$.
We have the following useful inequality. 



Lemma B.l. For any {xi,Xj) € M?'^, such that (xj)^(i) = and Xj^ib^ 

{^j)b(J) ~0 and W^jWlu) ''^ll^illMi)- 
Proof. By definition ()5.4p . we have 

k=i "k i=i 
e,m, 

Since K^^"'^') is reversible we have from Rayleigh's theorem 

^(...) ^^*-,up{£M|^:^^f|^, =0,x^ 0}, 

Ljk^k^k 

which concludes the proof. 
To deal with iterations of J, we express it as a sum over paths.

W )ai,k Jai \^ai,a'jJkV 

where -B^^'^,^- is an average stochastic kernel, 

(„) def ]_ y T-r (^^y) , . 

' ai,a'j 

$\Gamma^{(n)}_{ai,a'j}$ represents the set of directed paths of length n joining ai and a'j on
$L(\mathcal{G})$, and its cardinality is precisely $|\Gamma^{(n)}_{ai,a'j}| = (A^n)_{ai}^{a'j}$.

Lemma B.2. For any {xai,Xa>j) S I^^'^, such that (xj)^(i) = and 

k 

the following inequality holds 

Proof. Let x^^,- the contribution to Xa'j corresponding to the path 7 € L^"'^,^.. 
Using Lemma [B?T] recursively yields for each individual path 

ll^I'jllfe(i) ^ ll^ai llfe(») ) 

and, owing to triangle inequality, 

II II ^ 1 II T II ^ '^ll II 

\\Xa'j\\bU) ^ T 2^ iFa'jllfc(j) ^ ^2 Ipai • 

■ 

It is now possible to conclude the proof of the theorem. 

Proof of Theorem \5.3\( i). (i) Let v and v' two vectors with v' = vJ" = 
v(I — M)J^, {M is the projector defined in Proposition I5.2p since JM = 0. 
Recall that the effect of (I — M) is to first project on a vector with zero local 
sum, ^;,(v(I — Af ))^. = 0, Vi S V, so we assume directly v of the form 



Vai,k = Xai,kbk\ with {Xai)b(')=0- 



As a result v' = vJ" = v'(I — M) is of the same form. Let a^^'j^ "^'a'jtl^^P- 
We have ^ 

a'— s>j a— 

with ya'j^i h^P = Y.k^ai,kbf {B'^^)^^, X^. From Lemma [R2] applied to ya'j, 



\x'" 



A\n,b < 7ra'iX](^")a/^2lkai|lbW = \l^X2\\xh,b, 

since $\pi$ is the right Perron vector of A.